Skip to content

07_reverseString: handle Unicode surrogate pairs in reverseString solution #550

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

eyad-alkhalidy
Copy link

Because

The original reverseString solution used .split(""), which does not handle complex Unicode characters correctly. This can lead to incorrect behavior when the input includes surrogate pairs (e.g., emojis or some accented characters). This PR modifies the implementation to avoid that issue and includes a comment to make learners aware of the underlying limitation.

This PR

  • Updated the reverseString solution to use Array.from() instead of .split("")
  • Added a comment explaining the issue with surrogate pairs and Unicode handling in JavaScript

Issue

N/A

Additional Information

We've already been exposed to this problem in the MDN Reference page for String, linked here. This is why I think we can consider it in this exercise.

Pull Request Requirements

  • I have thoroughly read and understand The Odin Project Contributing Guide
  • The title of this PR follows the location of change: brief description of change format, e.g. 01_helloWorld: Update test cases
  • The Because section summarizes the reason for this PR
  • The This PR section has a bullet point list describing the changes in this PR
  • If this PR addresses an open issue, it is linked in the Issue section
  • If this PR includes any changes that affect the solution of an exercise, I've also updated the solution in the /solutions folder

@eyad-alkhalidy
Copy link
Author

Even if we don't want learners at this stage to worry about surrogate pairs, I think we should still encourage them to use the more appropriate Array.from(string) method instead of string.split("") when converting a string into an array of characters.

@mao-sz
Copy link
Contributor

mao-sz commented Aug 13, 2025

My 2c would be that since none of the tests involve surrogate pairs or grapheme clusters, split('') is sufficient as a provided solution.

Array.from() is also not a bulletproof solution because while it handles surrogate pairs, it doesn't handle grapheme clusters.

// Neither of these log ["👨‍👦"] - you'd need a custom splitter
console.log("👨‍👦".split(''), Array.from("👨‍👦"));

So if the solution was to dive into or just surface-accommodate surrogate pairs, the same argument can be made for "why not also grapheme clusters?" and that's definitely way out of scope of the intent of this exercise.

The surrogate pair problem isn't something I would expect learners to reasonably face anytime soon anyway, and if they ever do, I'm confident it'd be something they'd be able to figure out from research, experiments and external help.

@eyad-alkhalidy
Copy link
Author

That's a good point about grapheme clusters, and I completely agree with you, @MaoShizhong. Array.from() is not a silver bullet for all Unicode situations, and that diving into grapheme clusters is definitely beyond the scope of this exercise.

My thinking for this change now isn't so much about making the solution perfectly robust for every Unicode edge case, but more about establishing a modern best practice from the start.

While string.split('') works for the specific tests here, it's a pre-ES6 pattern with a common failure case (surrogate pairs). My concern is that by providing it in an official solution, we're implicitly endorsing a method that students will have to unlearn later when their code breaks on a simple emoji.

Array.from(string) is the modern, built-in language standard. Using it is a simple switch that correctly handles a significantly larger and more common set of characters. It requires no extra libraries or complex understanding.

By using Array.from(string), we're not aiming for perfect Unicode compliance, but rather teaching the current, idiomatic JavaScript way to convert a string to an array. It costs nothing in terms of code complexity (in fact, [...string] is even more concise) and provides a "free" upgrade in robustness that will serve learners well.

Ultimately, I feel it's better to equip learners with the modern, more reliable tool from the beginning, even if they don't face its specific benefits within this single exercise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants